Generalizing Boundary Points
Abstract
The complexity of numerical domain partitioning depends on the number of potential cut points. In multiway partitioning this dependency is often quadratic, or even exponential. Therefore, reducing the number of candidate cut points is important. For a large family of attribute evaluation functions only boundary points need to be considered as candidates. We prove that an even more general property holds for many commonly used functions: their optima are located on the borders of example segments in which the relative class frequency distribution is static. These borders are a subset of boundary points. Thus, even fewer cut points need to be examined for these functions. The results shed new light on the splitting properties of common attribute evaluation functions, and they have practical value as well. The functions that are examined also include non-convex ones. Hence, the property introduced is not just another consequence of the convexity of a function.
Introduction
Fayyad and Irani (1992) showed that the Average Class Entropy and Information Gain functions (Quinlan 1986) obtain their optimal values for a numerical value range at a boundary point. Intuitively, this means that these functions do not needlessly separate instances of the same class. The result reveals interesting fundamental properties of the functions, and it can also be put to use in practice: only boundary points need to be examined as potential cut points to recover the optimal binary split of the data. Recently the utility of boundary points has been extended to cover other commonly used evaluation functions and optimal multisplitting of numerical ranges (Elomaa and Rousu 1999). Other recent studies concerning the splitting properties of attribute evaluation functions include Breiman's (1996) study of the characteristics of ideal partitions of some impurity functions and Codrington and Brodley's (2000) study of the general requirements of well-behaved splitting functions.
Similar research lines for nominal attributes are followed by Coppersmith, Hong, and Hosking (1999).
This paper continues to explore the splitting properties of attribute evaluation functions. We introduce a generalized version of boundary points, the so-called segment borders, which exclude all cut points in the numerical range that separate subsets of identical relative class frequency distributions. The separated subsets do not need to be class uniform to warrant the exclusion, as is the case with boundary points. We show that it suffices to examine segment borders in optimizing the value of the best-known attribute evaluation functions. Hence, the changes in class distribution, rather than relative impurities of the subsets, define the potential locations of the optimal cut points (cf. López de Màntaras 1991). Two of the examined functions are non-convex. Hence, the property of splitting on segment borders is not only a consequence of the convexity of a function.
A partition $\biguplus_{i=1}^{k} S_i$ of the sample $S$ consists of $k$ nonempty, disjoint subsets and covers the whole domain. When splitting a set $S$ of examples on the basis of the value of an attribute $A$, there is a set of thresholds $\{T_1, \ldots, T_{k-1}\} \subseteq \mathrm{Dom}(A)$ that defines a partition $\biguplus_{i=1}^{k} S_i$ for the sample in an obvious manner:
$$
S_i =
\begin{cases}
\{ s \in S \mid \mathit{val}_A(s) \le T_1 \} & \text{if } i = 1, \\
\{ s \in S \mid T_{i-1} < \mathit{val}_A(s) \le T_i \} & \text{if } 1 < i < k, \\
\{ s \in S \mid \mathit{val}_A(s) > T_{k-1} \} & \text{if } i = k,
\end{cases}
$$
where $\mathit{val}_A(s)$ denotes the value of attribute $A$ in example $s$. The classification of an example $s$ is its value for the class attribute $C$, $\mathit{val}_C(s)$.
The next section recapitulates boundary points and introduces example segments. Then we prove that six well-known functions do not partition within a segment. We also explore empirically the average numbers of boundary points and segment borders in 28 UCI data sets. Finally, we relate our results to those of Breiman (1996) and outline further research directions.
Copyright © 2000, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.
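To make the distinction between boundary points and segment borders concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes one reading of the description above: examples are grouped into bins by distinct attribute value; a cut between two adjacent bins is skipped as a non-boundary point only when both bins are class uniform with the same class, and skipped as a non-segment-border when both bins have the same relative class frequency distribution. The names `candidate_cuts` and `partition_by_thresholds` are hypothetical, and each cut is reported as the lower of the two adjacent values, so that it can serve as a threshold $T$ in the "$\le T$" convention of the partition definition.

```python
from bisect import bisect_left
from collections import Counter
from fractions import Fraction
from itertools import groupby


def value_bins(values, labels):
    """Sort examples by attribute value and count classes per distinct value."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    return [(v, Counter(c for _, c in grp))
            for v, grp in groupby(pairs, key=lambda p: p[0])]


def relative_distribution(counts):
    """Exact relative class frequencies of one bin (Fractions avoid rounding)."""
    total = sum(counts.values())
    return {cls: Fraction(n, total) for cls, n in counts.items()}


def candidate_cuts(values, labels):
    """Return (boundary_points, segment_borders) as lists of thresholds.

    Assumption: a threshold is the lower of two adjacent distinct values.
    """
    bins = value_bins(values, labels)
    boundary, segment = [], []
    for (v1, c1), (_, c2) in zip(bins, bins[1:]):
        same_pure_class = len(c1) == 1 and len(c2) == 1 and set(c1) == set(c2)
        if not same_pure_class:
            boundary.append(v1)
        if relative_distribution(c1) != relative_distribution(c2):
            segment.append(v1)
    return boundary, segment


def partition_by_thresholds(values, thresholds):
    """Assign example indices to S_1..S_k using val <= T_1, T_{i-1} < val <= T_i, val > T_{k-1}.

    Note: this sketch does not check that every subset is nonempty.
    """
    ts = sorted(thresholds)
    parts = [[] for _ in range(len(ts) + 1)]
    for idx, v in enumerate(values):
        # bisect_left counts the thresholds strictly below v
        parts[bisect_left(ts, v)].append(idx)
    return parts


values = [1, 1, 2, 2, 3, 4, 5, 5]
labels = ["a", "b", "a", "b", "a", "a", "b", "b"]
boundary, segment = candidate_cuts(values, labels)
print(boundary)  # [1, 2, 4]
print(segment)   # [2, 4]
print(partition_by_thresholds(values, segment))  # [[0, 1, 2, 3], [4, 5], [6, 7]]
```

In this toy sample the cut between values 1 and 2 is a boundary point, because the two bins are mixed, but it is not a segment border, because both bins have the same 1:1 class distribution. This illustrates why the segment borders form a subset of the boundary points.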
Similar resources
Violating conformal invariance: Two-dimensional clusters grafted to wedges, cones, and branch points of Riemann surfaces
Lattice animals are one of the few critical models in statistical mechanics violating conformal invariance. We present here simulations of 2-d site animals on square and triangular lattices in non-trivial geometries. The simulations are done with the newly developed PERM algorithm which gives very precise estimates of the partition sum, yielding precise values for the entropic exponent θ (ZN ∼ ...
Violating conformal invariance: two-dimensional clusters grafted to wedges, cones, and branch points of Riemann surfaces.
Lattice animals are one of the few critical models in statistical mechanics violating conformal invariance. We present here simulations of two-dimensional site animals on square and triangular lattices in nontrivial geometries. The simulations are done with the pruned-enriched Rosenbluth method (PERM) algorithm, which gives very precise estimates of the partition sum, yielding precise values fo...
Fast and strongly localized observation for the Schrödinger equation
We study the exact observability of systems governed by the Schrödinger equation in a rectangle with homogeneous Dirichlet (respectively Neumann) boundary conditions and with Neumann (respectively Dirichlet) boundary observation. Generalizing results from Ramdani, Takahashi, Tenenbaum and Tucsnak (2005), we prove that these systems are exactly observable in arbitrarily small time. Moreover, ...
A Simple and Systematic Approach for Implementing Boundary Conditions in the Differential Quadrature Free and Forced Vibration Analysis of Beams and Rectangular Plates
This paper presents a simple and systematic way for imposing boundary conditions in the differential quadrature free and forced vibration analysis of beams and rectangular plates. First, the Dirichlet- and Neumann-type boundary conditions of the beam (or plate) are expressed as differential quadrature analog equations at the grid points on or near the boundaries. Then, similar to CBCGE (direct ...
Some points on generalized open sets
The paper is an attempt to represent a study of limit points, boundary points, exterior points, border, interior points and closure points in the common generalized topological space. This paper takes a look at the possibilities of an extended topological space and it also considers the new characterizations of dense set.
The Maslov Index Revisited
Abstract. Let $\mathcal{D}$ be a Hermitian symmetric space of tube type, S its Shilov boundary and G the neutral component of the group of bi-holomorphic diffeomorphisms of $\mathcal{D}$. In the model situation $\mathcal{D}$ is the Siegel disc, S is the manifold of Lagrangian subspaces and G is the symplectic group. We introduce a notion of transversality for pairs of elements in S, and then study the action of G on the s...
Publication date: 2000